Abstract: Random Linear Network Coding (RLNC) provides a theoretically efficient method for coding. Some of its practical drawbacks
are the complexity of decoding and the overhead due to the coding vectors. For computationally weak and battery-driven platforms,
these challenges are particularly important. In this work, we consider the coding variant Perpetual codes, which are sparse and non-uniform,
and whose coding vectors have a compact representation. The sparsity allows for fast encoding and decoding, and the non-uniform
protection of symbols enables recoding where the produced symbols are indistinguishable from those encoded at the source. The
presented results show that the approach can provide a coding overhead arbitrarily close to that of RLNC, but at reduced computational
load. The achieved gain over RLNC grows with the generation size, and both encoding and decoding throughput are approximately one
order of magnitude higher compared to RLNC at a generation size of 2048. Additionally, the approach allows for easy adjustment
between coding throughput and code overhead, which makes it suitable for a broad range of platforms and applications.
1 Introduction

Network Coding (NC) is a promising paradigm [1] that
has been shown to provide benefits in many different
networks and applications. NC enables coding at individual
nodes in a communication network, and thus is
fundamentally different from the end-to-end approach
of channel and source coding. With NC, packets are
no longer treated as atomic entities since they can be
combined and re-combined at any node in the network.
This allows for a less restricted view on the flow of information
in networks, which can be particularly helpful
when building distribution systems for less structured
networks such as meshed, peer-to-peer or highly mobile
networks.

In this work, we focus on random NC approaches,
i.e. RLNC [2], and disregard deterministic coding. The
reason is that our primary interest is cooperative and
highly mobile wireless networks, which fit perfectly with
the highly decentralized nature of RLNC. In particular,
RLNC reduces the signaling overhead and increases
robustness towards changing channel conditions in the
network. At the same time, it allows for the construction
of much simpler distribution systems, which is desirable
from an engineering point of view.

Unfortunately, RLNC is inherently computationally
demanding, which has spawned several efforts to produce
optimized implementations and to modify the underlying
code [3], [4]. Even though several solutions and implementations
have been declared to provide sufficient
coding throughput, continued efforts are valid as they
can ensure higher coding throughput. Computational resources
can then be conserved for other tasks, such as video decoding,
and the energy consumption introduced by coding can
be reduced further. This is of particular importance when
NC is deployed on battery-driven devices with modest
computational capabilities.

This paper presents our work on applying Perpetual
Codes, which were suggested and named in the unpublished
draft [5], to NC systems. The encoding is sparse
and non-uniform, which allows for fast decoding as fill-in
[6] is avoided while recoding is still possible. The
approach presented here is similar to what is called
a smooth perpetual code in [5], but with two significant
differences: neither zero padding nor a pre-code is used.
This simplifies the analysis, but complicates the final
decoding step. We describe how encoding and decoding
can be performed and analyze the overhead and the
complexity. We verify our results with our own C++
implementation, which also provides practical throughput
results. Furthermore, we describe, implement, and
evaluate recoding, which was not considered in [5]. The
main insight from our results is that RLNC is a better
choice at low to medium generation sizes, but perpetual
codes are more suitable at medium to high generation
sizes. Hence, perpetual codes are not a substitute for,
but a supplement to, RLNC.

This paper is primarily intended for researchers and
developers who work with reliable data distribution on
wireless and mobile platforms. Therefore we provide a
short overview of RLNC and related work in Section 2.
The approach to encoding, decoding and recoding is
presented in Section 3 together with algorithms aimed
at implementations in C or C++. Section 4 provides an
analysis of the performance of the code in terms of
decoding complexity and code overhead, and compares
measurement results obtained from our implementation
with the analytical expressions. Readers primarily interested
in theoretical results and familiar with Forward
Error Correction (FEC) and RLNC could therefore skip
Section 2 as well as some parts of Section 3.
2 RLNC and Related Work

When data is distributed from one or more sources to 
one or more sinks using RLNC, then it is encoded at the 
sources to produce coded symbols and coding vectors 
that describe the encoding procedure. Together a coded 
symbol and a coding vector form a coded packet. When 
a sink has received enough coded packets, it can decode 
the original data. Additionally, received symbols can be
recombined, and thus recoded, at any relaying node in
between the sources and sinks.


[Figure: a coded packet consists of an existing header, a coding vector, and a coded symbol.]


For practical reasons, the original data is typically
divided into generations [7] of size g. We denote the data
in such a generation as M. This ensures that coding
can be performed over data of any size, and that the
performance of RLNC is independent of the data size.
Each generation is divided into symbols, denoted as
m_i, and these symbols are then combined at random to
create a coded symbol, denoted as x_i. As all operations
are performed over a Finite Field (FF) F_q, the code is
linear, and thus new valid coded symbols can be created
from coded and non-coded symbols. Fig. 1 illustrates
how the original symbols can be combined at random
and provide an endless stream of coded symbols. The
original data can be decoded by inverting the coding
operations performed on the coded symbols. See [8], [9]
for introductions to FF and RLNC.





Fig. 1: Coded symbols are created from the original data. 


Dividing the data into generations reduces both the
computational work and the decoding delay. Unfortunately,
it also introduces the need for additional signaling
[10], [11], as each of the generations must be decoded
successfully before the original data is recovered fully.
It also increases the probability that the sink receives
linearly dependent symbols, which adds to the overhead
of the code. This overhead is well understood for network
topologies where (it can be assumed that) symbols
are only received from sources that hold the original
information [12]-[14]. In such systems, the parameters of
the code can be chosen so that the overhead tends to zero
and can be ignored. However, these parameters present
a trade-off where higher values will generally result in
lower code overhead but lower coding throughput [4].
The coding throughput also depends on less deterministic
parameters, e.g. the hardware platform, programming
language, and implementation optimizations [14]-[18].
Therefore a universally optimal set of values cannot be
identified, as they depend on the system and on the
target platform.

Some simplifications that can increase the coding
throughput of RLNC are binary, systematic, and sparse
variants [14], [19]-[21]. Binary codes are in widespread
use and can obtain a low code overhead. They can be
fast as operations in the binary field can be performed
in parallel by all modern computers. Using a systematic
code comes with no cost in terms of overhead, and can
potentially provide a high gain in both encoding and
decoding throughput. Unfortunately, it is not possible
to use this approach at every node, but only at the
sources. Thus there is little or no gain if recoding is
performed, which is the main reason to use RLNC in
the first place. Using a sparse random code provides
similar benefits and drawbacks as a systematic code. It
becomes impractical to perform recoding, and the gain in
decoding throughput can be small or nonexistent [22].

Alternatively, the underlying code can be fundamentally
modified or replaced to ensure a lower decoding
complexity. A noteworthy suggestion is to use a convolutional
code as the underlying code [23], [24], as such codes have
been used in communication systems for many years.
These efforts are still primarily theoretical as, to the best
of our knowledge, no implementation of convolutional
codes for NC currently exists. The work on perpetual
codes [5] is related to this work, since it uses a related
fundamental concept, combined with a concatenated
approach similar to Raptor codes [25]. However, the
authors' aim was to propose a cache-friendly rateless
erasure code, and they did not consider recoding, which
is a necessary feature when used in a system that exploits
NC. We note that linear block codes and convolutional
codes may in some cases be equivalent, as they can
describe codes with similar realizations using different
terminology [26], [27].

Another direction in the search for an improved trade-off
between computational work and code overhead was
suggested in [28]. Here the authors considered coding
over several generations, called a random annex code [11],
[29]. Each generation is extended to include symbols
from other generations, and thus when a generation is
decoded these extra symbols are released. This reduces
the problem of ensuring that all generations are decoded,
and thus the overhead. At the same time, it is less
computationally demanding as the decoding is performed in
an inner and an outer step. The approach is very useful
for file transfers, but less so for streaming, as the final
decoding delay is high because generations are not decoded
sequentially. Additionally, the problem of how recoding
could be performed has so far not been considered.
Note that the idea of a random annex can be applied to
many underlying codes, including the perpetual code
considered in this work.
3 Code Operation

This section introduces the code and the three operations,
encoding, decoding and recoding, that can be performed
at nodes in the network. The notation used for
analysis and algorithms is listed in Table 1. In vectors
and matrices, we denote the first element with index
zero. In some algorithms, the value −1 is used to denote
a non-valid or non-existing value.


TABLE 1: Notation used for analysis and algorithms

Symbol    Definition
g         Generation size
q         Field size
w         Coding vector width
F_q       A finite field with q elements
g         Coding vector with g elements, starting at element 0
G         Matrix containing all received coding vectors
x         Coded symbol
X         Matrix containing all received symbols
h         Local recoding vector
G_i       The ith row of the coding matrix
G_{i,j}   The element in row i and column j of the coding matrix
X_i       The ith row of the symbol matrix
X_{i,j}   The element in row i and column j of the symbol matrix
p         Local variable for pivot indices
γ         A randomly drawn integer


In RLNC, the elements in the coding vector g are
drawn completely at random, and thus each coded
symbol is a combination of all the original symbols in
one generation. This is not the case for the perpetual
approach that we consider in this work. Instead, an
element with index p is chosen as the pivot and the
following w elements are drawn at random from F_q. We
denote w as the width of the coding vector. See Fig. 2
for a small example of some resulting coding vectors.


         |<------ w ------>|
   1     γ0,1  γ0,2  γ0,3  0     0     0     0
   0     1     γ1,2  γ1,3  γ1,4  0     0     0
   0     0     1     γ2,3  γ2,4  γ2,5  0     0
   0     0     0     1     γ3,4  γ3,5  γ3,6  0
   0     0     0     0     1     γ4,5  γ4,6  γ4,7
   γ5,0  0     0     0     0     1     γ5,6  γ5,7
   γ6,0  γ6,1  0     0     0     0     1     γ6,7
   γ7,0  γ7,1  γ7,2  0     0     0     0     1

Fig. 2: All possible coding vectors, when g = 8 and w =
3. The γ's denote randomly drawn elements from F_q.


3.1 Encoding 

The data to be transmitted from the source is divided
into generations; we denote the data in such a generation
M. Each generation is divided into g symbols that are
represented with one or more FF elements in F_q. The
symbols are combined as specified by the coding vector
g in order to create coded symbols x.

$x = M \cdot g$   (1)

The construction of a coding vector g and the corresponding
coded symbol x is described by Algorithm 1.

Algorithm 1: encode
Input: M
  g ← 0
  p ← (γ mod g)
  g_p ← 1
  for i ∈ (p, p + w] do
      g_(i mod g) ← (γ mod q)
  x ← M · g
  return g, x


An index in the generation is drawn at random and
used as the pivot, p ∈ [0, g). The index in g that
corresponds to this pivot element is set to one. For
the subsequent w indices in g, an element is drawn at
random from F_q. The remaining elements in g are zeros.
The resulting coding vector is of the form illustrated in
Fig. 2. To create a coded symbol, the coding vector is
multiplied with the data, x = M · g. Together, the coding
vector g and coded symbol x form a coded packet.
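
As an illustration, the encoding steps above can be sketched in Python for the binary field (q = 2), where addition is XOR and the only non-zero coefficient is one. This is a minimal sketch under stated assumptions, not the implementation evaluated in this paper: the function name `encode`, the representation of symbols as equal-length byte strings, and the use of Python's `random` module are all choices made for the example.

```python
import random


def encode(symbols, w, rng):
    """Sketch of perpetual encoding over GF(2).

    symbols: list of g equal-length bytes objects (the generation M).
    w:       coding vector width, assumed to satisfy 0 <= w < g.
    rng:     a random.Random instance (assumed source of randomness).
    Returns (pivot, coding_vector, coded_symbol).
    """
    g = len(symbols)
    p = rng.randrange(g)                 # pivot index drawn from [0, g)
    vector = [0] * g
    vector[p] = 1                        # the pivot coefficient is always one
    for i in range(p + 1, p + w + 1):    # the w indices after the pivot
        vector[i % g] = rng.randrange(2) # wrap around the end of the vector
    coded = bytes(len(symbols[0]))       # all-zero symbol
    for coeff, sym in zip(vector, symbols):
        if coeff:                        # addition in GF(2) is XOR
            coded = bytes(a ^ b for a, b in zip(coded, sym))
    return p, vector, coded
```

Because w < g, the wrapped window never overwrites the pivot itself, so every produced vector has the form shown in Fig. 2.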

It is trivial to represent the coding vector in a very
compact way. Each coding vector can be represented
by an index p and w scalars s_1, ..., s_w. The number of bits needed for this
representation is given by Equation (2). The index can
take g values and each of the w elements can take q
values.

[Figure: compact representation of a coding vector as the index p followed by the scalars s_1, s_2, ...]

$|g| = \log_2(g) + w \cdot \log_2(q)$ [bits]   (2)
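
To give a feeling for the size of this representation, Equation (2) can be evaluated numerically. The sketch below is only illustrative: the function names are hypothetical, the width w = 32 used in the usage note is an assumed value, and the dense figure simply counts g scalars of log_2(q) bits each, as in a full RLNC coding vector.

```python
import math


def perpetual_vector_bits(g, w, q):
    """Bits for one compact coding vector, following Equation (2)."""
    return math.log2(g) + w * math.log2(q)


def dense_vector_bits(g, q):
    """Bits for a full coding vector with g scalars from a field of size q."""
    return g * math.log2(q)
```

For example, `perpetual_vector_bits(2048, 32, 2)` can be compared with `dense_vector_bits(2048, 2)` to see how the header cost scales with w rather than with g. A practical implementation would presumably round each term up to a whole number of bits.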

Coding vectors can be generated in slightly different
ways depending on how p is drawn and the size of
w, see Table 2. The systematic mode does not produce
coding vectors of the specified form, but we include it
for completeness.

TABLE 2: Different encoding modes.

Mode        p drawn                                      w
Random      at random ∈ [0, g)                           0 < w < g
Sequential  sequentially, looping from 0 to g−1          0 < w < g
Systematic  sequentially from 0 to g−1, subsequently     w = 0
            drawn at random ∈ [0, g)
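
The three ways of drawing the pivot in Table 2 can be sketched as follows. The function name `draw_pivots`, the mode strings and the use of `random.Random` are assumptions made for this example; only the pivot selection is shown, not the choice of w.

```python
import random


def draw_pivots(mode, g, count, rng):
    """Return the pivot indices of `count` consecutive coded packets."""
    pivots = []
    for n in range(count):
        if mode == "random":
            pivots.append(rng.randrange(g))   # uniform in [0, g)
        elif mode == "sequential":
            pivots.append(n % g)              # loop 0, 1, ..., g-1, 0, ...
        elif mode == "systematic":
            # first one pass over 0..g-1, afterwards uniform in [0, g)
            pivots.append(n if n < g else rng.randrange(g))
        else:
            raise ValueError("unknown mode: " + mode)
    return pivots
```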
3.2 Decoding

A node that receives coded packets can decode the
original data by collecting the coded symbols in X and
the coding vectors in G. The original information M can
be found as in Equation (3), provided that G is invertible
and thus has full rank.

$M = X \cdot G^{-1}$   (3)

To decode the original data in M, G must be reduced
to identity form by performing basic row operations that
are simultaneously performed on X. When it is not
possible to fully decode a symbol upon reception, it is
partially decoded and stored for later processing.
This is referred to as on-the-fly decoding. When enough
symbols have been received so that G has full rank, all
received symbols can be fully decoded and the original
data can be retrieved; we refer to this as final decoding.

Unlike RLNC and Sparse Random Linear Network
Coding (SRLNC), this perpetual approach defines the
location of the non-zero values in the coding vector.
This makes it possible to decode symbols efficiently
and without the problematic fill-in that can be observed
during decoding and recoding with SRLNC [22].

3.2.1 On-the-fly Decoding 

When a new coded packet arrives, its coding vector
is inserted into the decoding matrix if and only if it has a pivot
candidate that was not previously identified. We distinguish
between pivot and pivot candidate as the element
that is used as the pivot may only be found during
the final decoding. Otherwise the previously received
symbol with the same pivot candidate is subtracted from
the new symbol, and the pivot candidate of the new
symbol is changed. This is repeated until a new pivot
candidate is identified. If the coding vector is reduced
to the zero vector, the symbol is discarded.

In Fig. 3, we assume that three coded packets have
been received and their coding vectors have been inserted
into the decoding matrix. The pivot candidates
of the received packets are zero, one, and seven, respectively.
Subsequently, a coded packet with pivot candidate
zero is received. This is denoted with a filled circle
and arrow pointing to the coding vector of the packet
in the left hand-side matrix. A symbol with the same
pivot candidate has already been identified. Therefore,
the existing row zero is subtracted from the incoming
packet. This is denoted with the arrow pointing left into
the left hand-side matrix. The element that initially was
the pivot candidate is now zero and an element to the
right has now become the pivot candidate. This step is
repeated for the new pivot candidate: row one is
subtracted from the incoming packet and element two
becomes the pivot candidate. As this pivot candidate
was previously not identified, the coding vector is inserted
into the decoding matrix, which is marked with
orange and the arrow pointing right into the decoding
matrix.


A special case is when the on-the-fly phase causes the
pivot candidate to wrap around to the start of the coding
vector. An example of this is illustrated in Fig. 4. The
incoming packet has pivot candidate seven, for which a
pivot candidate has already been identified in G. Thus
row seven in G is subtracted from the incoming packet.
If the last element in the coding vector is reduced to
zero, the first element in the vector is considered
next and becomes the pivot candidate. In this case, the
resulting coding vector has a zero at index seven and
thus the pivot candidate is now index zero. The packet
is then further reduced similarly to the example in Fig. 3.


Algorithm 2: forwardSubstitute
Input: g, x
  while g ≠ 0 do
      p ← pivot(g)
      if G_p ≠ 0 then
          g ← g · (1/g_p) ⊕ G_p
          x ← x · (1/g_p) ⊕ X_p      ▷ substitute into new packet
      else
          G_p ← g · (1/g_p)
          X_p ← x · (1/g_p)          ▷ insert new packet
          return p
  return −1


In Algorithm 2, the existing row with the same pivot
candidate is substituted into the received symbol, unless
the received coding vector has been reduced to the zero
vector. If a new pivot candidate is identified, then the
coding vector and symbol are inserted into the respective
matrices. Importantly, this algorithm guarantees that w
will not increase during decoding.
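
The forward substitution can be sketched in Python for the binary field, where normalization is unnecessary because every non-zero element equals one. The sketch rests on several assumptions that are not part of Algorithm 2: empty rows of G and X are stored as `None`, coded symbols are plain integers combined with XOR, the current pivot candidate is passed in and advanced cyclically, and the loop is capped at 2g iterations so that it always terminates.

```python
def forward_substitute(G, X, vec, sym, pivot, max_iter=None):
    """On-the-fly decoding of one packet over GF(2).

    G, X:  lists of length g holding stored coding vectors / symbols,
           with None marking rows that have no pivot candidate yet.
    vec:   coding vector of the incoming packet (list of 0/1).
    sym:   coded symbol of the incoming packet (int, XOR is addition).
    pivot: pivot candidate of the incoming packet.
    Returns the row index where the packet was inserted, or -1 if the
    packet was discarded.
    """
    g = len(vec)
    vec = list(vec)
    if max_iter is None:
        max_iter = 2 * g                  # assumed cap on the iterations
    for _ in range(max_iter):
        if not any(vec):
            return -1                     # reduced to the zero vector
        while vec[pivot] == 0:            # move right, wrapping around
            pivot = (pivot + 1) % g
        if G[pivot] is None:              # new pivot candidate: insert
            G[pivot], X[pivot] = vec, sym
            return pivot
        # subtract (XOR) the stored row with the same pivot candidate
        vec = [a ^ b for a, b in zip(vec, G[pivot])]
        sym ^= X[pivot]
    return -1                             # give up and discard the packet
```

Each stored row has a one at its own pivot candidate, so every substitution clears the current candidate of the incoming vector and the candidate can only move onwards.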

The coding vector can be reduced to the zero vector if
it is a linear combination of previously received coding
vectors. It is also possible to end in a deadlock where a
sequence of rows is repeatedly subtracted from the new
packet. To avoid this, decoding should be terminated
after some attempts and the packet discarded. From
practical experiments, it has been determined that decoding
can be terminated after 2g or 3g iterations. To
avoid wasting operations on such cases, row operations
can first be performed on the coding vector and then
repeated on the coded symbol [22]. In both cases, the
overhead arises because the symbol is a linear combination
of already received symbols.

A simple optimization, in cases where the w of the
incoming packet is lower than the w of the held symbol
with the same pivot candidate, is to swap these
two to guarantee that w is never increased. Our current
implementation does not support this and we leave
it to future work to test whether this increases the
decoding throughput. However, previous experiments
showed that such optimizations can introduce a high
cost in terms of bookkeeping [22].
Fig. 3: On-the-fly decoding of a received coded packet. The right hand-side matrix is the decoding matrix G. The
left hand-side matrix shows the coding vector of the incoming symbol as it is decoded. The γ's denote random
field elements. The filled circle and arrow indicate the original coding vector of the incoming packet. The straight
lines indicate which rows are substituted into the coding vector. The arcs indicate the decoding steps.
Fig. 4: On-the-fly decoding similar to Fig. 3, but the pivot candidate wraps around the end of the decoding matrix.


3.2.2 Final Decoding 

When a pivot candidate has been identified for all rows,
final decoding is performed by forward substitution and
backward substitution. Initially, the decoding matrix
has a form similar to that shown in Fig. 5a. Note that
some of the elements might be zero. It should also
be noted that even though a pivot candidate has been
identified for all rows, this does not guarantee that the
decoding matrix has full rank. Therefore it is important
to perform the final decoding in a way that ensures that
the decoding matrix is not left in a state where future
decoding becomes impossible or problematic.

To bring the matrix into echelon form, forward substitution
is performed on the non-zero elements in the
lower left corner of Fig. 5a. When forward substitution
is performed on the first column, non-zero elements
can be introduced in the lower w rows and further
substitution becomes necessary, as illustrated in Fig. 5b.
After the forward substitution step, the decoding matrix
is in echelon form, as shown in Fig. 5c.

The final forward step in Algorithm 3 ensures that
the decoding matrix is always left in a valid state even
if partial final decoding occurs. This happens in cases
where it turns out that the decoding matrix does not
have full rank, even though a pivot candidate for each
row was identified.

In Algorithm 3, a pivot element must be defined for
each column (i). If no such pivot element can be found,
then none of the received symbols can
be used to decode the corresponding row, and we need
to receive additional symbols. Therefore we traverse all
the rows (j) from the diagonal and down, as we know
that for all rows above a pivot index has already been
identified. When we find a row for which the current
pivot element is non-zero, we swap it with the correct
row if it is not already at the correct position. We then
forward substitute into the rows below. If we iterate to
the last row without identifying a pivot element, then
we cannot decode the target row, and we discard the
symbol that is incorrectly located on row i. However,
we do not want to discard useful symbols, therefore we
check if the coding vector for row i is equal to the row
below; if not, we simply add it to the row below. If the
two symbols were identical, the result would be the zero
vector and we would discard two rows instead of one.
Finally, if we are looking for the pivot element for the
final column, there are no rows below our target row,
and we simply discard without checking. In this way,
the decoding matrix will always be brought as close to
echelon form as possible.

Fig. 5: The decoding matrix G at various states of the final decoding: (a) initial, (b) during final forward
substitution, (c) after final forward substitution. The dotted parts of the arrows indicate rows
where no substitution is needed. The arching arrows show how the pivot candidate moves towards the diagonal.

Algorithm 3: finalForward
  for i ∈ [0, ..., g) do
      for j ∈ [i, ..., g) do
          if G_{j,i} ≠ 0 then
              if i = j then
                  G_i ← G_i · (1/G_{i,i})
                  X_i ← X_i · (1/G_{i,i})        ▷ normalize
              if i ≠ j then
                  G_i ↔ G_j · (1/G_{j,i})
                  X_i ↔ X_j · (1/G_{j,i})        ▷ normalize and swap
              for k ∈ [max(j + 1, g − w), g) do
                  if G_{k,i} ≠ 0 then
                      G_k ← G_k ⊕ G_i
                      X_k ← X_k ⊕ X_i            ▷ substitute down
              break                              ▷ found a pivot, skip to the next
          else
              if j = (g − 1) then
                  if i ≠ (g − 1) then
                      if G_i ≠ G_{i+1} then
                          G_{i+1} ← G_{i+1} ⊕ G_i
                          X_{i+1} ← X_{i+1} ⊕ X_i    ▷ add to below symbols
                  G_i ← 0
                  X_i ← 0                        ▷ discard symbol

If the rank of the matrix is full after the forward
substitution, then standard backward substitution is performed
to bring the decoding matrix to reduced echelon
form and decode the original data.


Algorithm 4: finalBackward
  for i ∈ (g, ..., 0] do
      for j ∈ (i, ..., max(i − w, 0)] do
          if G_{j,i} ≠ 0 then
              G_j ← G_j ⊕ G_i · G_{j,i}
              X_j ← X_j ⊕ X_i · G_{j,i}


Starting from the bottom, all rows are used to remove
any remaining non-zeros in the rows above. Note that
for each index, it is only necessary to inspect the above
w rows as all other rows are guaranteed to be zero due
to the form of the decoding matrix. Algorithms 2-4 can
be combined to create the decoder in Algorithm 5.
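
The backward substitution can be sketched as follows for the binary field. The sketch assumes that G is already in echelon form with ones on the diagonal and with non-zero elements confined to the w positions to the right of the diagonal, that rows are stored as lists of 0/1, and that coded symbols are integers combined with XOR; the function name is hypothetical.

```python
def final_backward(G, X, w):
    """Reduce an echelon-form banded matrix over GF(2) to the identity.

    G: list of g rows (lists of 0/1), upper triangular with unit diagonal.
    X: list of g coded symbols (ints), updated alongside G.
    w: coding vector width, bounding how far above the diagonal
       a non-zero element can be found.
    Returns X, which then holds the decoded symbols.
    """
    g = len(G)
    for i in range(g - 1, -1, -1):            # columns from the bottom up
        # only the w rows directly above row i can be non-zero in column i
        for j in range(i - 1, max(i - w, 0) - 1, -1):
            if G[j][i]:
                G[j] = [a ^ b for a, b in zip(G[j], G[i])]
                X[j] ^= X[i]
    return X
```

When column i is processed, row i has already been cleared to the right of the diagonal, so adding it to a row above only removes the element in column i.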


Algorithm 5: decode
Input: g, x
  if forwardSubstitute(g, x) ≠ −1 then
      if rank(G) = g then
          finalForward()
      if rank(G) = g then
          finalBackward()
  return rank(G)


When a new packet arrives, it is first forward substituted.
If a new pivot element is identified, the coding
vector and the coded symbol are inserted into the decoding
matrix. When the rank of the decoding matrix is full,
final decoding is attempted using forward substitution.
This might initially fail, but when it succeeds final
backward substitution is performed and the original
data in the generation is decoded.

3.3 Recoding 

When two or more coded or non-coded symbols have
been received, they can be combined by recoding. This
can be described by Equations (4) and (5), where the collected
coding vectors and coded symbols are combined
as defined by h of length g', where g' is the number of
received symbols. Then x and g together form a recoded
packet.

g = Gh (4)

x = X h (5)

In classical RLNC, coded packets are accumulated and
recoding is performed as a separate operation, which
results in a significant computational load; we denote
this type of recoding active recoding. As explained in
[21], this form of recoding is not suitable when the
code is sparse, because the recoded symbol will become
denser with high probability [6], [22]. To combat this
problem, we introduce a new type of recoding called
passive recoding.

3.3.1 Active Recoding 

Combining all collected packets completely at random,
as in standard RLNC, results in recoded packets where
the non-zero elements are no longer confined to w
elements. If we instead pick packets that have similar
pivot elements, then in the worst case the resulting coded
packet will only have slightly more non-zero elements w'
than the original coding vectors. This decreases
the freedom in recoding, but allows us to maintain the
sparsity in recoded packets. Unfortunately, such an approach
significantly increases the complexity of recoding
as it introduces a search for an appropriate set of coding
vectors. Additionally, it is more deterministic than the
standard recoding approach, and thus great care must
be taken to avoid generating more linearly dependent
symbols.

3.3.2 Passive Recoding 

When on-the-fly decoding is performed, previously re¬ 
ceived symbols are subtracted from an incoming symbol 
to partially decode it. This combining of packets can also 
be considered as recoding and therefore the operations 
can be reused in order to reduce the computational load 
of recoding. 

If the operations performed on the received symbols 
are tracked, a symbol where a sufficient number of op¬ 
erations have been performed can be used as a recoded 
symbol. One way is to keep a list for each received
symbol, to record which symbols are substituted into the
symbol. However, this could become infeasible if g is
high. It is simpler to hold an integer for each symbol
that is used to count the number of other symbols that
have been substituted into the symbol. It is important to
remember that during decoding we attempt to decode
the symbols; therefore symbols that have been reduced
too much should not be used as recoded symbols directly.
We note that this passive approach can also be used for
other codes.
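
The counting variant can be sketched by extending on-the-fly decoding over the binary field with one integer per stored row. The names `insert_with_count` and `passively_recoded`, the `None` markers for empty rows, the 2g iteration cap and the threshold `min_ops` are all assumptions made for this illustration; an upper limit on the count, to skip symbols that have been reduced too much, could be added in the same way.

```python
def insert_with_count(G, X, counts, vec, sym, pivot):
    """On-the-fly decoding over GF(2) that also counts substitutions.

    counts[i] records how many stored symbols were substituted into the
    symbol that ended up in row i. Returns the row index or -1.
    """
    g = len(vec)
    vec, ops = list(vec), 0
    for _ in range(2 * g):                    # assumed iteration cap
        if not any(vec):
            return -1                         # linearly dependent packet
        while vec[pivot] == 0:
            pivot = (pivot + 1) % g           # next candidate, wrapping
        if G[pivot] is None:
            G[pivot], X[pivot], counts[pivot] = vec, sym, ops
            return pivot
        vec = [a ^ b for a, b in zip(vec, G[pivot])]
        sym ^= X[pivot]
        ops += 1                              # one more symbol mixed in
    return -1


def passively_recoded(counts, min_ops):
    """Rows whose symbols have had at least min_ops substitutions."""
    return [i for i, c in enumerate(counts) if c is not None and c >= min_ops]
```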

3.3.3 Active plus Passive Recoding

To combine the two types of recoding, we can monitor
the passive recoding. If some neighboring set of packets
combined meets our criteria for row operations, we can
combine these by actively recoding them and thus obtain
a recoded symbol. With this hybrid approach, we can
recode symbols whenever we need them and still reduce
the computational work associated with recoding.

3.3.4 Re-encoding 

When a receiver has decoded a generation, it can encode 
packets the same way as the original source. This is 
sometimes referred to as recoding, which we believe 
is misleading, and instead denote this re-encoding to 
distinguish this from encoding at the original source. 

3.3.5 Implementation 

In our implementation we have chosen to implement 
a simple version of the active plus passive recoding. The 
primary reason is to reuse the operations performed 
during decoding and at the same time allow recoding 
to be performed when and as much as desired. Another 
important consideration is to avoid introducing a deter¬ 
ministic behavior when recoding. 


Algorithm 6: recode
Input: G, X
  if rank(G) = 0 then
      return −1                        ▷ no symbols available
  p ← (γ mod g)
  while G_{p,p} = 0 do
      p ← (γ mod g)                    ▷ find pivot index
  h_p ← 1
  for i ∈ (p, p + w] do
      if G_{i,i} ≠ 0 then
          h_(i mod g) ← (γ mod q)      ▷ draw w elements
  g ← G · h
  x ← X · h                            ▷ perform recoding
  return g, x


First, row indices are drawn at random until a row that
is non-zero is identified. Then for each of the following
w rows that are non-zero, a random coefficient is drawn,
which defines the recoding vector h. The remaining
indices in h are zeros. The new coding vector and coded
symbol are then computed as g = G · h and x = X · h,
respectively.
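
A binary-field sketch of this recoding step is given below. As in the other sketches, it is not the implementation evaluated in this paper: empty rows are assumed to be stored as `None`, coded symbols are integers combined with XOR, the function returns `None` instead of −1 when no symbols are available, and the name `recode` and the `random.Random` argument are choices made for the example.

```python
import random


def recode(G, X, w, rng):
    """Draw a recoding vector h and combine the stored rows over GF(2).

    G, X: stored coding vectors (lists of 0/1) and symbols (ints), with
          None marking rows that hold no symbol.
    Returns (coding_vector, coded_symbol), or None if nothing is stored.
    """
    g = len(G)
    if all(row is None for row in G):
        return None                           # no symbols available
    p = rng.randrange(g)
    while G[p] is None:                       # find a non-empty pivot row
        p = rng.randrange(g)
    h = [0] * g
    h[p] = 1
    for i in range(p + 1, p + w + 1):         # the following w rows
        if G[i % g] is not None:
            h[i % g] = rng.randrange(2)       # random coefficient
    vec, sym = [0] * g, 0
    for i, coeff in enumerate(h):             # g = G * h and x = X * h
        if coeff:
            vec = [a ^ b for a, b in zip(vec, G[i])]
            sym ^= X[i]
    return vec, sym
```

Since only rows within a window of w + 1 consecutive indices are combined, the recoded vector stays sparse.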






This approach ensures that the width of the recoded
vector satisfies w' ≤ 2w. However, it is worth observing that if
a symbol with a higher w is received, the size of w can
be reduced during the forward substitution. This would
happen more frequently if the widths of the received and
the existing row were compared, as mentioned in Section 3.2.1.

4 Analysis and Experiments 

In this section we present analytical and experimental
results on the code overhead, complexity and throughput.
To verify the analytical expressions, we have implemented
the proposed code in C++ [30]. This also
provides us with the possibility to report on encoding
and decoding throughput, which is a more interesting
parameter since it defines the computational load at the
coding nodes. The current implementation is well tested
and we believe that it provides a good trade-off between
simplicity and throughput. As the code is available
under a research friendly license, we encourage suggestions
that can improve the throughput or simplify the
implementation.

4.1 Overhead 

The code overhead depends on the field size, density,
generation size and possibly other factors. From standard
RLNC, we have a lower bound for the code overhead
as defined in Equation (6), see m- The same lower
bound holds here, as the lowest overhead is obtained
when w = g − 1, in which case the perpetual code
becomes identical to RLNC.

Equation (6) evaluates the expected overhead based
on the probability that the rank increases at the receiver
when a new coded symbol is received. This is a function
of the generation size, g, the field size, q, and the rank at
the receiver, g'. For each of the indices where the decoder
has already identified a pivot element, the coefficient in
the incoming packet is reduced to zero by the decoder.
In the best case, the remaining g − g' elements can be
considered as drawn at random from F_q. Hence the
probability that these are all zero and the packet is
linearly dependent is 1/q^(g−g'). The mean overhead is
then calculated as the sum of the expected amount of
overhead for the decoding of each packet, for all possible
ranks of the decoder. Note that the overhead is primarily
due to the last packets, and that it becomes negligible for
high values of q.
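
The reasoning above can be checked numerically with a minimal sketch. It assumes that, at rank g', a received symbol is linearly dependent with probability 1/q^(g−g'), so that on average 1/(1 − 1/q^(g−g')) symbols are needed per rank increase; summing the excess over all ranks gives the expected overhead. The function name is hypothetical.

```python
def rlnc_expected_overhead(g, q):
    """Expected number of extra coded symbols needed to decode g symbols over F_q."""
    overhead = 0.0
    for rank in range(g):                          # receiver rank g' = 0 .. g-1
        p_dependent = float(q) ** -(g - rank)      # 1 / q^(g - g')
        # 1 / (1 - p) symbols are needed on average for one rank increase;
        # subtracting the one useful symbol leaves the expected excess.
        overhead += 1.0 / (1.0 - p_dependent) - 1.0
    return overhead

for q in (2, 16, 256):
    print(q, round(rlnc_expected_overhead(2048, q), 4))
```

For q = 2 the sum evaluates to about 1.61 symbols almost independently of g, which agrees with the RLNC overhead reported in Table 4, and it shrinks quickly as q grows.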



For a symbol to be independent, either its pivot or
one of the w coefficients must hit a new pivot element.
The pivot of the symbol is independent and hence the
probability is 1/g, but the w elements depend on the
pivot. The probability that one of these w elements
hits an uncovered pivot is r'/g where r' = [1, g − 1].
The expected number of tries to hit an unseen pivot is
thus Σ_{r'=1}^{g} (g/r') = g · Σ_{r=0}^{g−1} 1/(g − r). Thus the probability
that one of the w elements hits an unseen pivot can
be expressed as w/(g · Σ_{r=0}^{g−1} 1/(g − r)). Then the probability
that a symbol is covered when x coded symbols have
been received can be found as the complement of the
probability that none of the x coded symbols covers the
symbol. In the worst case, decoding is possible when all
g pivots are covered.


F_X(x) ≥        (7)


The resulting cdf can be used to calculate an upper
bound for the code overhead by evaluating the
corresponding survival function (sf), which defines the
probability that there is an uncovered symbol after x
transmissions and thus additional transmissions are
necessary.

$$ \beta \le \sum_{x=g}^{\infty} S_X(x) = \sum_{x=g}^{\infty} \bigl(1 - F_X(x)\bigr) \qquad (8) $$

$$ \alpha \le O \le \alpha + \beta $$

In Fig. 6 the overhead for different generation sizes is
plotted as a function of the code width (shown on the
x-axis). The resulting overhead is given on the y-axis.
The dotted lines denote the upper and lower bounds,
respectively.



Fig. 6: Code overhead as a function of g and w. The
dotted lines denote the lower and upper bounds,
respectively.

For each generation size, the overhead decreases as the
width increases until the width is sufficiently high and
the overhead becomes indistinguishable from the lower
bound. If the width is decreased below the sufficient
level, the overhead increases significantly. Therefore, values
of w below this point should generally not be used.
The bounds are loose for low values of w, but become
tighter as w increases. Thus the provided bounds are
useful for identifying a value of w that is sufficiently
high.
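
To get a feeling for how the overhead behaves as w shrinks, the following Monte Carlo sketch estimates the mean overhead over the binary field. It assumes that a coded symbol has a coefficient of one at a uniformly chosen pivot index and random bits at the following w indices, wrapping around at the end of the generation. This vector layout, the bitmask representation, and all function names are assumptions made for illustration and are unrelated to the C++ implementation used for the measurements.

```python
import random

def perpetual_vector(g, w, rng):
    """Coding vector over GF(2) as a bitmask: pivot bit set, next w bits random."""
    pivot = rng.randrange(g)
    vec = 1 << pivot
    for k in range(1, w + 1):
        if rng.getrandbits(1):
            vec |= 1 << ((pivot + k) % g)   # wrap around the generation
    return vec

def symbols_until_full_rank(g, w, rng):
    """Count received symbols until the decoder holds g independent vectors."""
    basis = {}                               # leading-bit index -> reduced vector
    received = 0
    while len(basis) < g:
        vec = perpetual_vector(g, w, rng)
        received += 1
        while vec:                           # Gaussian elimination over GF(2)
            lead = vec.bit_length() - 1
            if lead not in basis:
                basis[lead] = vec            # new pivot found, rank increases
                break
            vec ^= basis[lead]               # eliminate the known pivot
    return received

def mean_overhead(g, w, trials, seed=1):
    """Average number of symbols received beyond g before decoding is possible."""
    rng = random.Random(seed)
    total = sum(symbols_until_full_rank(g, w, rng) for _ in range(trials))
    return total / trials - g

for w in (4, 8, 31):
    print(w, mean_overhead(32, w, trials=200))
```

With w = g − 1 the estimate should land near the RLNC value of roughly 1.6 symbols, and it grows as w is reduced, which is the qualitative behaviour described above.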

We note that these results do not follow the overhead
as a function of the density defined in [21], which is
similar to the width for this code. This is not surprising
as the code investigated here is significantly less random
compared to the sparse RLNC considered in the
reference.


4.2 Complexity 

We express the computational complexity using a compound
metric called row multiplication-addition, where
a multiplication-addition is multiplying a row with a
scalar and adding or subtracting it to or from another
row. Here we only consider the binary field; therefore
the multiplication scalar is always one and a row
multiplication-addition is simply adding or subtracting
a row to or from another row.

To encode a single packet, the expected number of row
operations is given by Equation (9). We start with an
empty vector and first add the chosen pivot row to it.
For each of the following w indices, the corresponding
row is multiplied with a random element from F_q and
added to the new row. The probability that a randomly
drawn element from F_q is non-zero is 1 − 1/q.

$$ 1 + w \cdot \left(1 - \frac{1}{q}\right) \qquad (9) $$
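
The expected encoding cost can be sketched as follows, assuming one row operation for the pivot row plus one for each of the w following coefficients that happens to be non-zero. The simulation draws the coefficients uniformly from F_q and should agree with the closed form; both function names are hypothetical.

```python
import random

def expected_encode_ops(w, q):
    """Closed form: the pivot row plus w rows, each used with probability 1 - 1/q."""
    return 1 + w * (1 - 1 / q)

def simulated_encode_ops(w, q, trials, seed=7):
    """Average row operations when the w coefficients are drawn uniformly from F_q."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        nonzero = sum(1 for _ in range(w) if rng.randrange(q) != 0)
        total += 1 + nonzero               # pivot row plus the non-zero coefficients
    return total / trials

print(expected_encode_ops(48, 2))
print(simulated_encode_ops(48, 2, trials=5000))
```

In the binary case roughly half of the w rows are added, so the encoding cost scales with w rather than with the generation size g.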


During on-the-fly decoding, forward substitution is
performed on the incoming symbol until a new pivot
candidate is identified or the symbol is reduced to the
0 vector. Forward substitution continues as long as the
next element is non-zero, probability 1 − 1/q, and has not
already been identified as a pivot, probability 1 − r/g,
where r is the current rank of the decoding matrix. The
expected number of row operations for a generation is
found by summing over the reciprocal for all values of
r, from which the expected number of operations per
symbol is found by dividing by g.




( 10 ) 


For the upper bound for the final decoding, we consider
the worst case where most scalars are non-zero,
see Fig. 5a. The final forward stage can be
considered in two steps. First, the bottom w rows are
reduced by substituting the top g − w rows into them, so only
the last w elements are non-zero, hence Equation (11).
Then the bottom w rows are brought onto echelon form.
Equation (12) accounts for the forward substitution step
in the bottom right w × w submatrix. To include the
probability that an element in F_q is equal to zero, we
multiply with (q − 1)/q and divide by g to find the
operations per packet, which is rewritten as (q − 1)/(q · g).

$$ \alpha_{\text{forward1}} \le \frac{q-1}{q \cdot g} \cdot (g - w) \cdot w \qquad (11) $$

$$ \alpha_{\text{forward2}} \le \frac{q-1}{q \cdot g} \cdot \sum_{i=1}^{w-1} i = \frac{q-1}{q \cdot g} \cdot \frac{w \cdot (w-1)}{2} \qquad (12) $$

To finalize the decoding, a similar procedure is performed,
but this time upwards. Each of the g − w bottom
rows is substituted into the w rows directly above
it. Thus the number of operations is exactly the
same as in Equation (11) and Equation (12) and we
obtain Equation (13).

$$ \alpha \le \alpha_{\text{fly}} + 2\alpha_{\text{forward1}} + 2\alpha_{\text{forward2}} = \alpha_{\text{fly}} + \frac{q-1}{q \cdot g} \cdot \bigl(w\,(2g - w - 1)\bigr) \qquad (13) $$

Fig. 7 shows the upper bound and measured number
of row multiplication-additions performed to decode one
generation, both during the on-the-fly and final decoding
phase. The operations during the two phases are stacked
to show the total number of row operations.

Fig. 7: Mean row operations per decoded symbol (x-axis: generation size and width).

The analytical expressions for the on-the-fly and final
decoding provide good bounds for the measured results,
especially when w is sufficiently high. For low values of
w, the bound is less tight, but such settings should not
be used when the code overhead is considered.
These values can be compared with traditional RLNC
where the expected number of operations to decode a
packet is approximately g/2 for the binary case [22]. Thus
the reduction in complexity compared to RLNC grows
as g increases.
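
To make the comparison with RLNC concrete, the sketch below evaluates the final-decoding part of the bound, assuming the two forward-substitution terms take the form (q − 1)/(q · g) · (g − w) · w and (q − 1)/(q · g) · w · (w − 1)/2 and that both are counted twice (once downwards, once upwards). The on-the-fly term is left out because Equation (10) is not reproduced here, so the numbers only cover the final phase. Function names are hypothetical, and the (g, w) pairs are taken from rows of Table 4.

```python
def final_decoding_ops_bound(g, w, q=2):
    """Upper bound on row operations per symbol in the final decoding phase."""
    scale = (q - 1) / (q * g)
    forward1 = scale * (g - w) * w         # reduce the bottom w rows
    forward2 = scale * w * (w - 1) / 2     # echelon form of the w x w submatrix
    # Both steps are repeated upwards, giving scale * w * (2g - w - 1) in total.
    return 2 * forward1 + 2 * forward2

def rlnc_decoding_ops(g):
    """Approximate row operations per packet for binary RLNC, g / 2."""
    return g / 2

for g, w in ((32, 12), (128, 24), (512, 48), (2048, 96)):
    print(g, w, final_decoding_ops_bound(g, w), rlnc_decoding_ops(g))
```

Since the bound grows roughly like w while the RLNC cost grows like g/2, the gap widens with the generation size whenever w can be kept much smaller than g.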

4.3 Throughput 

A low complexity does not guarantee a low computational
load and therefore we investigate the coding
throughput. This is due to the complexity introduced
by the algorithms that determine how decoding is performed,
the quality of their implementation and the
bookkeeping they add. The architecture also affects the
throughput due to cache misses, memory latency and
memory throughput, which can also be influenced by the
memory access pattern.

For each setting, data was encoded and subsequently 
decoded on the same machine. Each setting was run for 
a minimum of 30 minutes to reduce the deviation. 


TABLE 3: Specifications of the test machine.

Model      Dell Optiplex 790 DT
CPU        Intel Core i7-2600 @ 3.40GHz, 8192 KB L2 cache
Memory     16GB DDR3 1333 MHz Dual channel
Chipset    Intel Q65 Express
OS         64bit Debian Wheezy
Compiler   GNU G++ 4.6


To provide a comparison, we have performed benchmarks
of our RLNC implementation using the same
values of g as for the perpetual code. The encoding and
decoding throughputs are listed in Table 4 along with
the corresponding gains over RLNC.

As expected, the throughput for both encoding and
decoding decreases as g and w increase. The gain increases
for higher values of g, which corresponds with
the analytical results. The highest gain in encoding and
decoding throughput is observed at the highest tested
generation size of 2048, and is approximately eleven and
nine times that of RLNC, respectively. It should also
be noted that using an excessively high w should be
avoided as it decreases the throughput without reducing
the code overhead. Additionally, a higher w increases the
size of the coding vector representation, see Equation (2),
which adds to the overall overhead.
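
The quoted factors can be reproduced from the throughputs in Table 4. The short sketch below assumes that the gain column is defined as (perpetual throughput / RLNC throughput − 1) · 100, which matches the listed values up to rounding, and it uses the w = 96 row at g = 2048 because its ratios match the factors of eleven and nine quoted above. That this is the row the text refers to is an assumption, and the helper names are hypothetical.

```python
# Throughputs in MB/s copied from Table 4 for g = 2048.
PERPETUAL_W96 = {"encoding": 198.93, "decoding": 129.93}
RLNC = {"encoding": 17.93, "decoding": 14.29}

def speedup(perpetual_mbps, rlnc_mbps):
    """Throughput of the perpetual code relative to RLNC."""
    return perpetual_mbps / rlnc_mbps

def gain_percent(perpetual_mbps, rlnc_mbps):
    """Relative gain over RLNC in percent (assumed definition of the Gain column)."""
    return (speedup(perpetual_mbps, rlnc_mbps) - 1.0) * 100.0

for phase in ("encoding", "decoding"):
    s = speedup(PERPETUAL_W96[phase], RLNC[phase])
    gain = gain_percent(PERPETUAL_W96[phase], RLNC[phase])
    print(f"{phase}: {s:.1f}x, gain {gain:.0f}%")
```

Small deviations from the table, on the order of one percentage point, are to be expected because the listed throughputs are themselves rounded.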

To make a fair comparison with RLNC we must consider
both the overhead and the complexity/throughput
simultaneously, as the performance of the perpetual
code is a trade-off between overhead and speed. As the
lower bounds are the same as for RLNC we can never
hope to achieve a lower code overhead. However, we can
achieve a similar overhead but at lower computational
complexity. For this reason the throughputs for the
perpetual approach are marked for the lowest value of
w where the code overhead is similar to RLNC.

TABLE 4: Measured encoding and decoding throughput
for RLNC and the proposed perpetual code. Throughputs
are reported for different generation sizes, g, and
in the case of the perpetual code at different widths, w.
The lowest tested value of w where the perpetual code
provides a similar code overhead as RLNC is marked.

g      w      Overhead    Encoding   Gain    Decoding   Gain
              [packets]   [MB/s]     [%]     [MB/s]     [%]

32     6       8.24       2883.91     156    2879.99     147
       8       4.15       2512.14     123    2034.45      74
       12      1.70       1915.58      70    1359.66      16
       16      1.65       1506.90      34    1214.37       4
       24      1.62       1080.19      -4     881.03     -25
       RLNC    1.61       1126.16       -    1167.67       -

128    12     17.06       1612.36     441    1209.09     326
       16      6.05       1309.09     339     951.04     235
       24      1.65        951.04     219     620.30     119
       32      1.64        742.68     149     526.43      86
       48      1.63        520.06      74     360.34      27
       RLNC    1.61        298.28       -     283.69       -

512    24     24.33        747.29     921     490.50     655
       32      7.02        612.69     737     392.39     504
       48      1.68        449.39     514     273.00     320
       64      1.65        354.48     384     228.40     251
       96      1.63        249.25     241     167.27     157
       RLNC    1.61         73.17       -      64.99       -

2048   48     36.01        314.79    1656     203.11    1321
       64      9.03        263.36    1369     167.82    1074
       96      1.66        198.93    1010     129.93     809
       128     1.64        160.38     795      99.90     599
       192     1.62        115.00     541      66.81     368
       RLNC    1.61         17.93       -      14.29       -

Ideally, all decoding should be performed on-the-fly as
this decreases the final decoding delay and distributes
the processing load evenly. At the same time, decoding
should be performed in such a way that fill-in does not
occur as this reduces the amount of work necessary
to decode Ibj. In our presented results, the ratio of
operations performed during the on-the-fly phase is low,
see Fig. 7. Fortunately, the structure of the code makes it
possible to perform something that can best be described
as opportunistic backwards substitution. Our tests with
this approach show that most of the decoding operations
can be performed when symbols are received. However,
this algorithm is more difficult to analyze and due to
space constraints we have omitted it.

We note that the implementation does not take advantage
of multiple cores. This could however be exploited
by encoding multiple streams simultaneously or encoding
simultaneously from different blocks of the same
data set.

4.4 Recoding 

In general, the performance of the proposed recoding
approach will depend on the network topology; therefore
general results are difficult to obtain. Instead, we
consider the simplest multi-hop topology to provide
fundamental insights into the recoding performance of
the scheme.

Source A transmits data to R with some erasure
probability. Both R and B send a single bit of feedback,
namely when they have achieved full rank. When a
symbol arrives at R, it is forwarded to B, and to correct
the ε_RB erasures, on average (1 − ε_RB)^(−1) − 1 symbols are
recoded at R and transmitted to B. Initially, when too
few symbols have been accumulated at R, it is pointless
to attempt recoding, which is also the case for traditional
RLNC. Therefore, the remaining missing symbols will be
re-encoded after R has achieved full rank.

Fig. 8: A simple multihop scenario.

We define the parameter 1 < μ < g which specifies
the minimum number of symbols that should be combined
to create a recoded symbol. In traditional RLNC,
typically all received symbols are combined at random,
thus μ ≈ r · (1 − 1/q). When R combines μ symbols
to create a recoded symbol, the probability that one or
more symbols previously not seen at B is included in the
recoded symbol can be expressed as in Equation (14).

$$ P_{\text{unseen}} \le 1 - (1 - \varepsilon)^{\mu} \qquad (14) $$

For ε = 0.3 and P_unseen ≈ 0.99, μ = 12. When
a new symbol is received at R, decoding is attempted
and with some probability μ or more row operations are
performed on the symbol, in which case the resulting
inserted symbol has been successfully passively recoded.
This probability, P_passive, is found as the probability that
a random sequence of μ symbols is non-zero, where the
probability of each symbol can be calculated from r and
g.

$$ P_{\text{passive}}(r, \mu) = \frac{r}{g} \cdot \frac{r-1}{g-1} \cdots \frac{r-(\mu-1)}{g-(\mu-1)} = \prod_{i=0}^{\mu-1} \frac{r-i}{g-i} \qquad (15) $$

Otherwise active recoding becomes necessary, and is
possible if μ pivot elements have been identified in some
range of size Δ. The symbols in this range are combined
at random and the resulting symbol will have a width
w' ≤ w + Δ. The maximal width accepted is denoted
w'_max = w + Δ_max. Here we assume that Δ_max = 2 · μ in
order to permit some freedom during recoding.

Consider a range of size Δ. From the number of ranges
where at least one pivot element is zero and the total
number of combinations, the probability that a range
contains μ or more pivots can be found. As there are g
such ranges we can find the probability that at least one
range is suitable, see Equation (16).

(16)

The rest of recoding is performed as re-encoding after
R has obtained full rank.

$$ P_{\text{re-encode}}(r, \Delta_{\max}) = 1 - P_{\text{passive}}(r, \mu) - \sum_{\Delta=\mu}^{\Delta_{\max}} P_{\text{active}}(r, \mu, \Delta) \qquad (17) $$

Finally, we sum over r = [1, g] for P_passive, P_active,
and P_re-encode to obtain the distribution of the recoded
output symbols. Passive recoding is preferred over active
recoding, and active recoding over a smaller range
is desirable. The resulting distribution is illustrated in
Fig. 9. The x-axis denotes the maximum Δ and the
y-axis denotes the expected ratio of recoded packets.
Probabilities for passive recoding, active recoding, and
re-encoding are shown.

1. In general, the necessary amount of recoding depends on the
network topology, and more specifically on the correlation of incoming
links at nodes that receive recoded symbols.

Fig. 9: Distribution of Δ for generated recoded symbols
for g = 512, w = 48, ε_RB = 0.3, μ = 12, and Δ_max = 2 · μ.

For 30% of the symbols, whereof 20% are re-encoded,
w' = w and the generated symbols are indistinguishable
from symbols encoded at a source. For a third of the
generated symbols w' = 48 + 12 = 60, which is slightly
larger than w = 48. The remaining generated symbols have w'
in the range [61, 72]. This demonstrates that most times a
recoded symbol can be generated and that the expected
Δ is low. This is important in order to ensure a low delay
at R and fast decoding at B.
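
The two probabilities that drive the recoding analysis in this section can be sketched as below. The code assumes that the bound on including an unseen symbol has the form 1 − (1 − ε)^μ and that the passive recoding probability is the product of (r − i)/(g − i) for i = 0, ..., μ − 1. Both forms are readings of the garbled Equations (14) and (15) rather than confirmed expressions, and the function names are hypothetical.

```python
def p_unseen_bound(eps, mu):
    """Bound on the probability that a combination of mu symbols holds one unseen at B."""
    return 1.0 - (1.0 - eps) ** mu

def p_passive(r, g, mu):
    """Probability that mu consecutive positions all hit already identified pivots."""
    if r < mu:
        return 0.0                       # fewer than mu pivots exist at rank r
    prob = 1.0
    for i in range(mu):
        prob *= (r - i) / (g - i)
    return prob

print(p_unseen_bound(0.3, 12))
for r in (128, 256, 384, 512):
    print(r, p_passive(r, 512, 12))
```

With ε = 0.3 and μ = 12 the bound evaluates to roughly 0.99, in line with the parameters used for Fig. 9. The passive probability only becomes appreciable once the rank r is close to g.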

For lower g, recoding becomes more difficult as w'
approaches g. However, more advanced decoding algorithms
can be employed to reduce the added Δ;
unfortunately, space does not permit the inclusion of
these.
5 Conclusion

In this paper, we presented our initial findings on perpetual
codes, which are suitable for RLNC. We described
how encoding, decoding, and recoding can be performed
and listed the necessary algorithms. We provided initial
analysis of the code performance in terms of overhead
and complexity. The analytical results were compared
with measurements obtained from our C++ implementation,
from which we also obtained coding throughput
measurements.

The analysis and the tests showed that the proposed 
approach can obtain a coding overhead similar to RLNC, 
but at a much lower computational cost. For all tested 
settings resulting in a code overhead similar to that of 
RLNC, the proposed approach led to improved encoding 
and decoding throughput. For the highest tested generation
size of 2048, the decoding throughput was almost
one order of magnitude higher than that of RLNC.
Additionally, the approach provides an easily adjustable 
parameter that allows for a trade-off between coding 
complexity and code overhead. 

Throughout this work, we have compared the proposed
perpetual code with RLNC and SRLNC and not
with other rateless FEC codes. The reason is that NC 
codes allow for recoding where traditional FEC codes do 
not, thus they are less suitable for cooperative networks. 
Compared to RLNC and SRLNC, the perpetual code 
provides the following benefits depending on the chosen 
values of g and w: 

1) Faster encoding, recoding, and decoding. 

2) Sparsity is retained when recoding. 

3) Small coding vector representation. 

4) Simple decoding algorithms. 

As the code is sparse, fast encoding is trivial. Fast 
decoding is possible due to the structure of the code that 
helps to avoid fill-in during decoding. Recoding can be 
performed fast by using the suggested passive recoding 
approach. We note that this trick can also be employed 
for other NC codes. 

The structure and density of the coded packets can
be retained if recoding is performed more carefully than
what has been proposed for standard RLNC; we have
denoted this active recoding. We note that doing so limits
the degrees of freedom when recoding, but we believe
that our proposed passive plus active recoding presents a
good trade-off between these two approaches.

As the locations of the non-zero elements are well
defined, it is trivial to create compact representations
of the coding vectors. We believe that this is important,
as the commonly used assumption that a pseudo-random
function can be used to compress the coding vector
cannot be used if recoding is to be supported [21]. Thus
the size of the coding vector must be included in the
total overhead.

The presented decoding algorithms are slightly more
complicated than for standard RLNC. However, due to
the sparsity and structure of the coded symbols, it is
possible to eliminate many of the inspections that are
necessary when inverting the coding matrix.

We have only considered the random encoding mode,
meaning that the pivot element is always drawn at random
and independently of the previous pivots, see Table 2.
This corresponds to the worst case where the channel
is extremely lossy and thus systematic approaches
are of no benefit. In cases where the erasure probability is
low or moderate, a systematic or sequential mode could
be used which would decrease the code overhead and
in particular the decoding complexity.

For the future, more rigorous analysis of the code 
overhead is necessary, especially for the case where low 
values of w are used. Such analysis would be useful 
when more advanced variants of the perpetual code are 
studied. For our implementation, we plan to perform 
tests using higher field sizes and perform benchmarks 
on mobile devices. This could help to understand how 
to choose optimal parameters and to demonstrate the 
validity of the proposed solution on mobile phones and 
tablets. 